Determining Neural Network Connectivity
using Evolutionary Programming

John R. McDonnell and Don Waagen

NCCOSC, RDT&E Div.
San Diego, CA 92152-5000

ABSTRACT

This work investigates the application of evolutionary
programming, a stochastic search technique, for
determining connectivity in feedforward neural networks.
The method is capable of simultaneously evolving both
the connection scheme and the network weights. The
number of synapses is incorporated into an objective
function so that network parameter optimization is done
with respect to a connectivity cost as well as mean
pattern error. Experimental results are shown using
feedforward networks for simple binary mapping
problems.

INTRODUCTION 

The neural network design process is largely based on
heuristics. Previous experience (or the work of other
researchers) often dictates an initial network
configuration for the problem at hand. If the network
can be trained to achieve the designer’s goals, the design
process is terminated. If success is not attained, a
testing phase ensues and is largely trial and error. The
result can often be a network with excess parameters and
little regard for computational costs.

In this research, a connectivity cost associated with the
neural  network  configuration  is  incorporated  into  the 
optimization  procedure  in  an  effort  to  reduce  the  number 
of  synapses.  An  optimized  architecture  offers  increased 
throughput  for  real-time  signal  processing  applications  as 
well  as  decreased  memory  requirements. 

Simultaneously determining both network parameters and
structure requires a search procedure which is amenable
to combinatorial optimization. The more successful
algorithms for these types of problems have generally
been stochastic search techniques such as simulated
annealing, genetic algorithms, and simulated evolution.

The simulated evolution, or evolutionary programming
(EP), paradigm has been shown to have the desired
attributes: combinatorial optimization capabilities, the
ability to determine model structure, and the ability to
train neural networks.

The premise of the current research is that near-minimal
size neural network architectures can be evolved under an
objective function which incorporates both neural network
connectivity and weight parameters. Further, the
proposed approach takes advantage of computational
resources during the design/training phase, thereby
removing the burden of evaluation by trial-and-error from
the designer. For purposes of discussion, Fig. 1
illustrates the structure of a hypothetically evolved neural
network where the connectivity between neurons is
determined via a multi-agent stochastic search technique.
Nodes which are not connected can be pruned. This work
extends previous research in evolving neural network
architectures (where both the number of neurons and
connectivity are stochastically determined using EP) by
investigating an alternative strategy to evolving neural
network connectivity.

Similar work has been undertaken by Bornholdt and
Graudenz using genetic algorithms to determine both
network structure and parameters. Due to the generality
of their implementation, recurrent networks can result,
requiring multiple sweeps to reach a stable state. The
approach  investigated  in  this  work  is  limited  to 
feedforward  networks.  The  EP  paradigm  is  outlined  in 
the  next  section  along  with  its  application  to  training 
neural  networks.  This  training  method  is  then  augmented 
so  that  the  connectivity  between  layers  can  be  randomly 
determined  to  yield  a  structure  similar  to  that  shown  in 
Fig.  1.  Finally,  training  results  are  given  for  simple 
binary  mapping  problems. 


Figure 1. ... with variable connectivity.

APPLYING EP TO NEURAL NETS

Evolutionary Programming

Evolutionary programming is a neo-Darwinian search
paradigm suggested by Fogel et al. This stochastic
search method is typically utilized as a global optimizer.
EP has been successfully applied to a variety of
optimization problems including the traveling salesman
problem, parameter estimation and system
identification, and neural net training.

The EP optimization algorithm can be described by the
following steps:

1. Form an initial population $P_0(x), \ldots, P_{2N-1}(x)$ of size 2N. The
parameters $x$ associated with parent element $P_i$ are
randomly initialized from a user-specified search domain.

2. Assign a fitness score to each element $P_i(x)$ in
the population.

3. Reorder the population based on the number of wins
generated from a stochastic competition process.

4. Generate offspring $(P_N, \ldots, P_{2N-1})$ of the highest
ranked N elements $(P_0, \ldots, P_{N-1})$ in the population by
perturbing $x$.

5. Loop to step 2.

In addition to providing a systematic means of stochastic
search, the generality of the EP optimization algorithm
lends power to its implementation. The user is not
bound to any particular coding structure or mutation
strategy. EP is used in this investigation since it is well
suited for simultaneously evolving both model structure
and parameters.

Determining Network Weights with EP

Evolutionary programming can be used for training neural
networks. The selected objective function is the same as
that used in backpropagation: minimize the sum-squared
error function $E = \sum_p \sum_k (t_{pk} - o_{pk})^2$ over all patterns $p$ for
$k$ output neurons, where $t_{pk}$ and $o_{pk}$ denote the target
and actual outputs. The EP algorithm given in the previous
section is applied to determining neural network weights,
and results are then shown for sample training runs using
various scaling factors on the XOR mapping problem.
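
As a rough illustration, the five EP steps listed in the previous section can be sketched in Python on a toy cost function. The names `evolve` and `sphere`, the number of opponents per competition, the search bounds, and the choice of Gaussian perturbations whose variance equals the parent's cost are all assumptions of this sketch rather than details fixed by the algorithm description.

```python
import random


def evolve(cost, dim, n_parents=10, generations=200, bounds=(-5.0, 5.0),
           n_opponents=5, seed=0):
    """Minimal EP loop: score, stochastic competition, keep N, mutate N."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Step 1: initial population of size 2N drawn from the search domain.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(2 * n_parents)]
    for _ in range(generations):
        # Step 2: assign a score to every element (lower cost is better).
        scores = [cost(x) for x in pop]
        # Step 3: each element meets randomly chosen opponents and earns a
        # win whenever its cost is lower; reorder by number of wins.
        wins = []
        for i in range(len(pop)):
            opponents = [rng.randrange(len(pop)) for _ in range(n_opponents)]
            wins.append(sum(scores[i] < scores[j] for j in opponents))
        order = sorted(range(len(pop)), key=lambda i: (-wins[i], scores[i]))
        parents = [pop[i] for i in order[:n_parents]]
        parent_scores = [scores[i] for i in order[:n_parents]]
        # Step 4: each parent produces one offspring by perturbing x.
        # Assumption: zero-mean Gaussian noise with variance = parent cost.
        offspring = [[xi + rng.gauss(0.0, s ** 0.5) for xi in x]
                     for x, s in zip(parents, parent_scores)]
        # Step 5: loop to step 2 with the combined population of size 2N.
        pop = parents + offspring
    return min(pop, key=cost)


def sphere(x):
    """Toy cost function (hypothetical stand-in for a network error)."""
    return sum(xi * xi for xi in x)


best = evolve(sphere, dim=3)
```

Tying the perturbation variance to the parent's cost makes the search step shrink as the solution improves, which mirrors the error-scaled weight perturbation used for network training.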

Initially, a population consisting of 2N feedforward
networks is generated. Each network in the population is
represented by a multidimensional weight array with
weights initially chosen from a U[-0.5, 0.5] distribution.
Next, a cost is assigned to each network in the population.
This cost is typically the mean of the sum-squared pattern
error E previously discussed. The "best" N members of
the population generate offspring (perturbed weight sets)
according to $W_o = W_p + \delta W_p$, where $\delta W_p$ is $N(0, S_f \cdot E_p)$ with
a scaling coefficient $S_f$ and mean sum-squared pattern
error $E_p$ for each parent network. The scaling factor is a
probabilistic analog to the step size used in gradient
descent methods and may also be treated as a random
variable within the EP search strategy. The effect of the
scaling factor is shown in Fig. 2 for the XOR mapping.
The variance of the weight perturbations is bounded by the
total system error in this application. To emulate the
probabilistic nature of survival, a pairwise competition is
held where individual elements compete against randomly
chosen members of the population. For example, if
network $i$ is randomly selected to compete against
network $j$, a win is awarded to network $i$ if $E_i < E_j$.
The N networks with the most "wins" are kept and the
process is repeated.
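
The network-specific pieces of this procedure can be sketched for a 2-2-1 XOR network as follows. This is a minimal sketch under stated assumptions: the flat nine-element weight layout, the logistic activation, and the function names (`forward`, `mean_pattern_error`, `make_offspring`) are illustrative choices, and the perturbation is taken to be zero-mean Gaussian with variance equal to the scaling factor times the parent's mean pattern error.

```python
import math
import random

# XOR training set: (inputs, target).
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
       ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(a):
    # Logistic function written with tanh so large |a| cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * a))


def forward(w, x):
    """2-2-1 network. Assumed layout: w[0:3] and w[3:6] are the two hidden
    neurons (two input weights, then bias); w[6:9] is the output neuron."""
    h0 = sigmoid(w[0] * x[0] + w[1] * x[1] + w[2])
    h1 = sigmoid(w[3] * x[0] + w[4] * x[1] + w[5])
    return sigmoid(w[6] * h0 + w[7] * h1 + w[8])


def mean_pattern_error(w, patterns=XOR):
    """Mean over patterns of the squared output error."""
    return sum((t - forward(w, x)) ** 2 for x, t in patterns) / len(patterns)


def make_offspring(w, error, scale, rng):
    """Perturb every weight with N(0, scale * error) noise."""
    sigma = math.sqrt(scale * error)
    return [wi + rng.gauss(0.0, sigma) for wi in w]


rng = random.Random(1)
parent = [rng.uniform(-0.5, 0.5) for _ in range(9)]  # U[-0.5, 0.5] start
error = mean_pattern_error(parent)
child = make_offspring(parent, error, scale=100.0, rng=rng)
```

Because the noise variance is proportional to the parent's error, a network that already maps XOR well produces offspring that stay close to it, while a poor network is perturbed strongly.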


Figure  2.  EP  training  of  a  2-2-1  XOR  mapping  network 
for  various  scaling  factors. 


EVOLVING  CONNECTIVITY 


This section investigates structural level adaptation within
the EP search. The objective function has been
modified to be a linearly weighted combination of the
number of connections and the mean sum-squared
pattern error

$J = \alpha E + \beta N_c$

where $N_c$ is the number of connections.
A heuristic which might be employed would be to let
$\beta = \alpha E_{des}/N_{max}$, thereby incorporating the desired training
error $E_{des}$ and the maximum possible number of connections $N_{max}$
to reasonably weight the cost associated with the
evolved number of connections.
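
A short numerical sketch shows how a weighted sum of error and connection count trades the two terms off. The function names are hypothetical, the default weights are the values reported for the experiments (1 and 0.001), and the error values and connection counts below are made-up numbers chosen only for illustration.

```python
def connectivity_cost(mean_error, n_connections, alpha=1.0, beta=0.001):
    """Linear combination of pattern error and number of connections."""
    return alpha * mean_error + beta * n_connections


def heuristic_beta(alpha, desired_error, max_connections):
    """Heuristic weight: desired error spread over the maximum connections."""
    return alpha * desired_error / max_connections


# Hypothetical comparison: a dense network with low error against a
# sparse network with twice the error.
dense = connectivity_cost(mean_error=0.010, n_connections=33)
sparse = connectivity_cost(mean_error=0.020, n_connections=9)
```

With these illustrative numbers the sparse network has the lower total cost, so the search would prefer it even though its pattern error is higher.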

Analogous to the weight array, a connectivity array has
been specified where (one of its elements) c = 1 if a
connection exists or c = 0 if no connection is present.
A connectivity array that has all of its elements set to 1
yields a fully-connected feedforward network. The
designer must specify the number of hidden neurons over
which the search is conducted. This determines the
maximum number of connections. In previous work, a
synapse was randomly chosen from the range of possible
connections and modified based on its current state.
That is, disconnected synapses were connected and
connected synapses were disconnected. The number of
connections which may be affected at each mutation is
arbitrarily set by the designer or may even be determined
in a random fashion. The connectivity array is
incorporated in the neuron output dot product term,
thereby nulling any signals between disconnected
neurons. Weights are continually modified in the event
that a neuron pair is reconnected.
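
The role of the connectivity array can be sketched with two small helpers. This is an illustrative sketch, not the implementation used in the experiments: the names `neuron_input` and `flip_random_synapses` and the flat-list representation of one neuron's incoming synapses are assumptions.

```python
import random


def neuron_input(weights, mask, inputs):
    """Masked dot product: a synapse with c = 0 contributes no signal."""
    return sum(c * w * x for c, w, x in zip(mask, weights, inputs))


def flip_random_synapses(mask, n_flips, rng):
    """Toggle n_flips distinct, randomly chosen synapses (0 -> 1, 1 -> 0)."""
    child = list(mask)
    for idx in rng.sample(range(len(child)), n_flips):
        child[idx] = 1 - child[idx]
    return child


weights = [0.5, -2.0, 1.0]
mask = [1, 0, 1]                      # middle synapse is disconnected
net = neuron_input(weights, mask, [1.0, 1.0, 1.0])
mutated = flip_random_synapses(mask, n_flips=1, rng=random.Random(0))
```

Keeping the weight of a disconnected synapse in the weight array, rather than deleting it, is what allows the weight to keep evolving and to be reused if the synapse is later reconnected.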

In order to place an emphasis on signal propagation
through the network, a strategy for connection and
modification has been developed based on the activity
levels of a neuron. This strategy assigns a probability of
connection $P_c$ to the connection $C_{jk}^{l}$ between neuron $j$ in
layer $l$ and neuron $k$ in layer $l+1$, based upon the
variance in neuron $j$'s output over all of the patterns
in the training set, where the variance for neuron $j$ in
layer $l$, $U_j$, is determined from the activation or output
levels over the number of patterns.

Neurons which have high variance on their activation
levels will tend to be connected to other neurons. This
may also be viewed as promoting connections from
neurons (which are essentially hyperplanes) that provide
a measure of discrimination on the feature space.
Neurons which have low variance on their activation
levels correspond to hyperplanes which separate few data
points, and thereby provide little information to the
network.
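
The idea of tying connection probability to activation variance can be illustrated with a short sketch. The variance computation is the ordinary population variance over the training patterns. The mapping from variance to probability shown here, normalizing each neuron's variance by the layer total, is purely an assumption of this sketch, as are the function names and the uniform fallback used when every neuron in the layer is constant.

```python
def activation_variance(outputs):
    """Population variance of one neuron's outputs over all patterns."""
    mean = sum(outputs) / len(outputs)
    return sum((o - mean) ** 2 for o in outputs) / len(outputs)


def connection_probabilities(layer_outputs):
    """layer_outputs[j] lists neuron j's output for every training pattern.

    Assumption: each neuron's connection probability is its variance
    divided by the summed variance of the layer.
    """
    variances = [activation_variance(o) for o in layer_outputs]
    total = sum(variances)
    if total == 0.0:
        # Assumed fallback: no neuron discriminates, so treat all alike.
        return [1.0 / len(variances)] * len(variances)
    return [v / total for v in variances]


# A neuron that switches between 0 and 1 versus one stuck at 0.5.
probs = connection_probabilities([[0.0, 1.0, 0.0, 1.0],
                                  [0.5, 0.5, 0.5, 0.5]])
```

Under this assumed normalization the switching neuron receives all of the probability mass, while the constant neuron, which separates no data points, receives none.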

Synapses are randomly chosen as candidates for
modification. If a chosen synapse is not connected then
its probability of becoming connected is evaluated.
Conversely, the probability of disconnection is calculated
if the synapse is connected. If all of the synapses were
candidates for mutation during each generation, this would
be a self-fulfilling strategy where a single neuron would
dominate. However, only a small number of randomly
chosen synapses are evaluated for mutation when
generating an offspring network. The probabilities of
connection or disconnection are determined according to
the neuron’s class. The three classes are self-evident with
the corresponding connection strategies as follows:

hidden unit neuron: The probability of connection is
as calculated above, and the probability of
disconnection is determined as $P_d(C_{jk}^{l}) = 1 - P_c(C_{jk}^{l})$.

input neuron: For the binary mappings used in these
studies, the variance on the input units is constant and the
strategy used for the input neurons reduces to the uniform
probabilistic connection strategy used previously. To
promote coupling effects between neurons, the connection
probabilities are multiplied, yielding the probability of
connection $P_c(C_{jk}^{l}) = P_c(C_{jk}^{l}) \cdot P_c(C_{k}^{l+1})$. The
disconnection probability is determined in a similar
fashion according to $P_d(C_{jk}^{l}) = P_d(C_{jk}^{l}) \cdot$ ...

bias neuron: Since the bias neurons are invariant, the
probability of connecting to any neuron is contingent upon
the variance in the activation levels of that neuron. As a
result, the probability of connecting to a given neuron is
simply the probability of that neuron connecting to the
next layer. Thus the connection probability of a bias
neuron can be given by $P_c(C_{bias,k}^{l}) = P_c(C_{k}^{l+1})$. Likewise,
its disconnection probability is given by
$P_d(C_{bias,k}^{l}) = P_d(C_{k}^{l+1})$.


RESULTS 

Experiments were conducted with N = 10 parent
networks, $\alpha = 1$, $\beta = 0.001$, and $S_f = 100$ for the XOR and
3-bit parity mappings using the probabilistic connection
criteria discussed above. The networks were initialized
with a random connection strategy as opposed to initially
being fully connected. Figures 3-6 show the results of
using these criteria for the XOR mapping with 8 hidden
units. Figures 7 and 8 show an example of an evolved
network for the 3-bit parity problem with 16 hidden
units.

As a measure of the sparseness of a network’s
connectivity, a dilution ratio has been defined as
$D = S/N^2$, where S is the total number of synapses and N
is the number of neurons. The dilution ratio for the
evolved networks is given in Figures 4, 6, and 8,
respectively. The stochastic nature of the search process
makes the occurrence of two identical evolved configurations
highly unlikely. This is not to say that, once the unused
neurons are discarded, the networks will always be
dissimilar.
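
For concreteness, a dilution ratio of this kind can be computed as below, assuming it is the synapse count divided by the square of the neuron count. The function name and the example network (a fully connected 2-2-1 network with bias connections ignored) are hypothetical and are not taken from the evolved networks in the figures.

```python
def dilution_ratio(n_synapses, n_neurons):
    """Synapse count relative to the square of the neuron count."""
    return n_synapses / n_neurons ** 2


# Hypothetical 2-2-1 network, fully connected between adjacent layers:
# 2*2 + 2*1 = 6 synapses among 5 neurons.
d_full = dilution_ratio(n_synapses=6, n_neurons=5)
```

A smaller ratio indicates a sparser network, so removing synapses while holding the neuron count fixed lowers the value.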


Since only binary mapping problems were investigated, it
is not clear how the approach given in this study will
work on classification or continuous mapping problems.
Nevertheless, stochastic training techniques are becoming
prevalent in neurocomputing (especially in hardware
implementations). During these investigations, issues in
orthogonal learning (search) and population dynamics
became prominent. These topics are being addressed in
future work.


Figure 4. The final evolved connectivity for the XOR
mapping network, D = 0.111.


Figure 5. Evolving the connectivity for the XOR
mapping. See final architecture at right.


Figure 6. The final evolved connectivity for the XOR
mapping network, D = 0.129.


Figure 7. Evolving connectivity for the 3-bit
parity mapping. See final architecture at right.


Figure 8. The final evolved connectivity for the 3-bit
parity mapping network, D = 0.066.


